Importance Sampling Estimates for Policies with Memory
Abstract
Importance sampling has recently become a popular method for computing off-policy Monte Carlo estimates of returns. It is known that importance sampling ratios can be computed for POMDPs when the sampled and target policies are both reactive (memoryless). We extend that result to show how the ratios can also be computed efficiently for policies with memory state (finite state controllers) without resorting to the standard trick of pretending the memory is part of the environment. This allows for very data-efficient algorithms. We demonstrate the results on simulated problems.
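As a rough illustration of the two cases the abstract contrasts, the sketch below computes a trajectory's importance sampling ratio first for reactive policies and then for finite state controllers whose memory state is hidden and summed out with a forward recursion. The parameterization (the tables `init`, `trans`, and `act`, and every function name) is an assumption made for this example and is not taken from the paper; it is one plausible way to organize the computation, not necessarily the authors' algorithm.

```python
def reactive_ratio(target, behavior, trajectory):
    """Importance sampling ratio for memoryless policies.

    target[o][a] and behavior[o][a] are action probabilities given
    observation o. The environment's probabilities cancel in the ratio,
    so only the policies' action probabilities are needed.
    """
    ratio = 1.0
    for o, a in trajectory:
        ratio *= target[o][a] / behavior[o][a]
    return ratio


def fsc_action_likelihood(init, trans, act, trajectory):
    """Probability of the observed actions given the observations under a
    finite state controller, with the hidden memory state summed out.

    Hypothetical layout (an assumption of this sketch):
      init[m]         probability of starting in memory state m
      trans[m][o][m2] probability of moving from m to m2 after observing o
      act[m2][o][a]   probability of action a in memory state m2 given o
    """
    n = len(init)
    alpha = list(init)  # alpha[m]: joint probability of actions so far and memory m
    for o, a in trajectory:
        alpha = [
            sum(alpha[m] * trans[m][o][m2] for m in range(n)) * act[m2][o][a]
            for m2 in range(n)
        ]
    return sum(alpha)


def fsc_ratio(target, behavior, trajectory):
    """Ratio of action likelihoods; target and behavior are (init, trans, act)."""
    return (fsc_action_likelihood(*target, trajectory)
            / fsc_action_likelihood(*behavior, trajectory))
```

The forward recursion costs time linear in the trajectory length and quadratic in the number of memory states, and it never needs the sampled memory sequence to be recorded. A controller with a single memory state reduces to the reactive case, which gives a simple sanity check.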
Similar resources
Approximating Bayes Estimates by Means of the Tierney Kadane, Importance Sampling and Metropolis-Hastings within Gibbs Methods in the Poisson-Exponential Distribution: A Comparative Study
Here, we work on the problem of point estimation of the parameters of the Poisson-exponential distribution through the Bayesian and maximum likelihood methods based on complete samples. The point Bayes estimates under the symmetric squared error loss (SEL) function are approximated using three methods, namely the Tierney Kadane approximation method, the importance sampling method and the Metrop...
Multi-step Off-policy Learning Without Importance Sampling Ratios
To estimate the value functions of policies from exploratory data, most model-free off-policy algorithms rely on importance sampling, where the use of importance sampling ratios often leads to estimates with severe variance. It is thus desirable to learn off-policy without using the ratios. However, such an algorithm does not exist for multi-step learning with function approximation. In this pap...
Policy Improvement for POMDPs Using Normalized Importance Sampling
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP, can estimate the returns for finite state controllers, allows experience to be gathered from arbitrary sequences of policies, and estimates the return for any new policy. We motivate the estimator from function-approximation and importance sa...
On the use of likelihood ratio as indicator of the accuracy of importance sampling estimates
This paper presents some observations made from experimenting with the use of importance sampling on large and small systems. The key point is to develop heuristics that enable the use of importance sampling even when the biasing strategy cannot be proven to be optimal or produce estimates with bounded relative error. The main observation is that the likelihood ratio and its relative error see...
Local Adaptive Importance Sampling for Multivariate Densities with Strong Nonlinear Relationships
We consider adaptive importance sampling techniques which use kernel density estimates at each iteration as importance sampling functions. These can provide more nearly constant importance weights and more precise estimates of quantities of interest than the SIR algorithm when the initial importance sampling function is diffuse relative to the target. We propose a new method which adapts to the ...
Publication date: 2001